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Abstract 

Cross-species translation of genomic information may play a pivotal role in applying biological knowledge gained from 
relatively simple model system to other less studied, but related, genomes. The information of abiotic stress (ABS)- 
responsive genes in Arabidopsis was identified and translated into the legume model system, Medicago truncatula. Various 
data resources, such as TAIR/AtGI DB, expression profiles and literatures, were used to build a genome-wide list of ABS 
genes. tBIastX/BlastP similarity search tools and manual inspection of alignments were used to identify orthologous genes 
between the two genomes. A total of 1,377 genes were finally collected and classified into 18 functional criteria of gene 
ontology (GO). The data analysis according to the expression cues showed that there was substantial level of interaction 
among three major types (i.e., drought, salinity and cold stress) of abiotic stresses. In an attempt to translate the ABS genes 
between these two species, genomic locations for each gene were mapped using an in-house-developed comparative 
analysis platform. The comparative analysis revealed that fragmental colinearity, represented by only 37 synteny blocks, 
existed between Arabidopsis and M. truncatula. Based on the combination of E-value and alignment remarks, estimated 
translation rate was 60.2% for this cross-family translation. As a prelude of the functional comparative genomic approaches, 
in-silico gene network/interactome analyses were conducted to predict key components in the ABS responses, and one of 
the sub-networks was integrated with corresponding comparative map. The results demonstrated that core members of the 
sub-network were well aligned with previously reported ABS regulatory networks. Taken together, the results indicate that 
network-based integrative approaches of comparative and functional genomics are important to interpret and translate 
genomic information for complex traits such as abiotic stresses. 
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Introduction 

Model systems may play a central role in translating genomic 
information from one species to another. A key rationale of 
sequencing the model species with enormous expenses is that 
knowledge gained from relatively small genome can be transferred 
to a related, but commonly larger and more complex, species and 
even across difiFerent families. Such cross-species translation is 
largely based on precisely identifying orthologous genes with 
shared evolutionary origins. 

In plants, Arabidopsis is the first species whose genome was fuUy 
sequenced and contains perhaps the most comprehensive context 
of genomic information than any other model systems, and 
thereby can serve as the central model system for the translational 
genomics approaches. Comparative genomics should be the most 



prevalent analysis tool used for purposes of translating genomic 
information across different species, and works based upon the 
assumption that genomic contents and orders are, to some extent, 
conserved among different, but related, species despite of various 
levels of evolutionary changes. This assumption has been vahdated 
many times with respect to conservation of genomic structures and 
gene functions between orthologous loci. Naturally, the ability to 
transfer genomic information between different species depends on 
evolutionary distances and the nature of genomic change in 
particular regions. In the past, the translation of genomic 
information had, with limited capability, been conducted using 
common RFLP probes [1] and EST-based gene markers [2,3]. 
Presently, it is becoming more direct, but even greater complex, 
due to strikingly rapid advancement of sequencing technologies 
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(i.e., next generation sequencing or NGS) and resulting availability 
of genomic information. This technological advance leads the 
genomic science to a situation where the capability of bioinfor- 
matic analysis is increasingly emphasized. 

Since the TAIR project (http://www.arabidopsis.org) first 
started in 1999, Arabidopsis has been playing a pivotal role, 
largely due to its well curated genomic information and functional 
annotation, in translational genomic studies within the Brassica- 
ceae, including a broad range of related crop species such as 
Brassica okracea (C genome, n = 9) [4], B. rapa (A genome, n= 10) 
and amphidiploid B. napus (AC genome, n= 19) [5]. Arabidopsis 
genomic information was also used to compare its genomic 
structure with other fully sequenced eudicot species including 
grape [6], poplar and papaya [7,8], and even with far more distant 
monocot species, such as rice [9] and maize [10]. It was shown 
that Arabidopsis genome had only scant synteny with relati\'ely 
distant monocot species [9,10]. However, coexpression-based 
comparative network analysis of orthologous genes predicted 
between Arabidopsis and rice showed considerable level (approx- 
imately 77%) of conser\'ed coexpression patterns [11], suggesting 
that cross-spc-cies translation of conserved functional genomic 
context would be more meaningful rather than simple structural 
genomic comparison. 

The legume family includes a broad span of species, with 
roughly 20,000 species and 700 genera [12]. The family consists 
largely of three subfamilies: the PapUionoideae, with approxi- 
mately 70% of species, which is a relatively recent-diverged group 
and includes most crop legumes; the Mimosoideae, with some 
15%; and the Caesalpionoideae, known to contain the most 
ancient legume taxa, with the remainder. The potential of 
translational genomic analyses in the legume family and its 
application for the crop improvement may be particularly 
promising than any other plant groups, because well-defined 
model systems, such as Medicago truncatula and soybean, and a large 
number of domesticated legume species are available. More 
importandy, whole genome sequencing of five legume species, 
including M. truncatula [13], Lotus japonicus [14], Glycine max [15], 
Phaseolus vulgaris (unpublished, genome data available at Phyto- 
zome; www.phytozome.net) and Cajanus cajan [16], has recentiy 
been completed. At the dawn of the whole genome sequencing 
completion of these legume species, translational genomic 
approaches were suggested for the purposes of DNA marker 
development, positional cloning followed by gene discovery, and 
other omics studies with ultimate goals of legume crop improve- 
ment [17,18]. As an example of the application, 5460 unigenes, 
which were identified as orthologs in M. truncatula, L. japonicas and 
soybean, were newly mapped on Pea (Pisum sativum) genome, and 
hence leading to comparative analysis of these four legume 
genomes [19]. 

Of these five legume species, M. truncatula should be a nodal 
legume model system that may have a potential as the translational 
interface to accelerate crop improvements in legumes, because of 
its evolutionar)' proximity and frequently found genome conser- 
vation with other economically important legumes [17]. Never- 
theless, a functional genomic translation has yet not been 
conducted between these two representative model systems, 
Arabidopsis and M. truncatula. Therefore, as the first step toward 
cross-family translation of genomic information, it is expected that 
M. truncatula, due to its relatively better established genomic data 
than any other sequenced legume species and its diploid nature, 
win make it anchoring resource to translate genomic information 
from Arabidopsis, which is the representative model system of the 
Brassicaceae, into the legumes. 



These two famili(;s, the Brassicaceae and Fabaceae, in which 
contain significant portion of agro-economically important crops, 
are known to have diverged very long time ago, approximately 
125-136 million years ago (MY A) [20], and presumably have 
experienced diverse scenario of evolutionary events (Figure 1). In 
addition to the ancient divergence time, even for model species 
with relatively small genomes, cross-species translation of genomic 
information is not malleable largely due to the complexity of 
genomic structures accumulated throughout the evolutionary 
history of given species. 

In this study, we attempted to identify abiotic stress (ABS)- 
responsive genes, mainly using Arabidopsis as a central species for 
the translation of genomic information, and to transfer the 
genomic information into M. truncatula, as a anchoring model 
system within the legume family. Here, we report corresponding 
results, including genome-wide collection of ABS-related genes, 
estimation of cross-family translation rate, identification of key 
components in ABS response and genomic locus translation 
through comparative analysis. The main goal of this study is to 
provide researchers with basic information on genome-widely 
collected ABS-related genes towards the improvement of crop 
quality in ABS tolerance in the future. 

Materials and Methods 

Genome-wide screening of ABS-related genes and 
identification of cross-family orthologous genes 

To search for the ABS-responsive genes, we employed different 
types of available data resources including literature, gene 
expression profiles, Arabidopsis database (TAIR at http:/ /www. 
arabidopsis.org) and Arabidopsis Gene Index (AtGI; http:// 
compbio.dfci.harvard. edu/tgi/cgi-bin/tgi/gimain.pl?gudb = arab). 
Genes associated with three major and the most important stresses 
(drought, salinity and cold stress) were screened in this study. In 
the events Arabidopsis DBs were used for the search, different key 
words relevant to the abiotic stresses were used to search ABS 
genes, and each of gene descriptions were investigated to confirm 
its functional relevance to the abiotic stresses. The genes collected 
from multiple data resources were combined into a single Excell 
worksheet and organized, basically, according to their locus IDs 
with other related information, such as annotations, expression 
cu(;s, references and other related information (see Table SI). 

For the first step of the translational analysis, the whole genome 
information for two model species, TAIR verlO (http://www. 
arabidopsis.org) and Mt ver3.5 (M. truncatula HapMap Project; 
http://www.medicagohapmap.org/genome), was downloaded in- 
to our own ser\'er system. To identify orthologous genes between 
M. truncatula and Arabidopsis, we used in-house batch tBlastX/ 
BlastP homolog search tools with the selected ABS-responsive gene 
set. Final decision on the orthology of pair-wise aligned genes was 
carefully made by taking integrated knowledge into consideration 
of the E-values, percent identities, cumulated total length of 
alignment and finally visual inspection for the quality of individual 
alignments. 

Genomic locus translation and comparative mapping 

Comparative genomic mapping was conducted with the 
orthologous genes predicted between Arabidopsis and M. 
truncatula. Genomic locus positioning of each gene was made by 
using fuUy sequenced genomic information of Arabidopsis and M. 
truncatula. For purposes of the comparative analysis, we developed 
an in-house platform for the comparative mapping. Accordingly, 
ABS gene database was constructed to store and use the ABS gene 
information to produce the corresponding comparative maps with 
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Figure 1. Taxonomic relationship of major crop and model species within two families, the Fabaceae and Brassicaceae. Whole 
genome-sequenced species are higiilighted in bold italic. Most crop legumes occur in galegoid and phaseoloid clades. Known divergence time is 
denoted by arrows. IVIYA, million years ago. 
doi:1 0.1 371/journal.pone.0091 721 .gOOl 



selected options, for example chromosome number selection and 
gene ontology criteria, for the analysis (the manuscript for the DB 
and analysis tool development will be prepared separately). 
Predicted orthologs were interlinked with other relevant informa- 
tion, such as functional description of genes, locus ID and 
expression cues, within the tentative orthologous gene database 
(TOGDB). To properly describe functional aspects of entire gene 
set, eighteen different gene ontology (GO) criteria were selected by 
considering appropriate level of GO hierarchies. 

Synteny blocks were determined by following criteria: (1) each 
block must contain at least more than four consecutively coUinear 
genes, (2) these collinear genes within the synteny block should 
have consistently increasing or decreasing locus IDs, (3) distance 
between neighboring genes should be less than 2Mb. Finally, the 
synteny blocks were determined by re-examining the orthologous 
relationships of genes in each block. 

Gene network and interactome analysis 

To extract functional meaning of ABS-responsive genes and to 
connect this information with comparative genomic data, we 
employed a combination of tools and database including AraNet 

(http://wv™'.fuii( ti()nalnet.org/araiict) [21] and CytoScape [22]. 
General gene network analysis pipeline was as follows. First, we 
used the web-based AraNet for the gene network analysis 
according to eighteen GO functional groups, and each of which 
gave ROC (Receiver Operating Characteristic) curve. Based on 
the ROC analyses, 12 GO criteria were further analyzed to select 
seed gene set. Initially, 125 seed genes were selected based on the 
evidence code (http:/ /functionalnet.org/aranet/evidence_code. 



txt) and they were further analyzed against remaining GO groups, 
resulting in a total of 240 seed genes. The 240 gene set was further 
used to obtain additional protein-to-protein interaction (PPI) 
network data by integrating published PPI information [23,24]. 
After removing overlapped portion of these two data sets, 
integrated PPI network was constructed, which was composed of 
2,406 protein nodes and 6325 direct physical interactions. Sub- 
networks predicted to be associated with the abiotic stresses were 
further analyzed and selected from the integrated network by 
extracting gene and PPI networks shared between these two 
datasets. We further elaborated the selected sub-networks using 
the Cytoscape program and combined them with the information 
on seed genes and their corresponding phenotypes. 

Results 

Collection and identification of abiotic stress-responsive 
genes 

To translate ABS-responsive genes across legume species, we 
used Arabidopsis as the central anchor model system, because of 
its richest contents of gene information among other plant model 
systems, to coiiiiect correspotiding information with M. truncatula, 
the representative model legume species. In addition, dought 
stress-responsive genes identified by Affymetrix DNA microarray 
experiment in Grapevine {Vitis sinifera) [25] were translated into 
Arabidopsis orthologous genes, with the intent of initially 
centralizing all corresponding information into a single model 
system. TAIR database and AtGI were used as major resources for 
identification of ABS-related genes. Promoter-targeted data were 
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also included [26]. As a result, a total of 1377 genes (1.4K gene set) 
associated with abiotic stresses were finally collected (Table SI) 
and classified according to the gene ontology (GO) biological 
process criteria. Eighteen GO terms were carefiiUy selected to best 
describe the fianctional criteria of collected gene set, including cell 
homeostasis, regulation of transcription, signal transduction and so 
forth (Figure 2). Of these GO criteria, it is noticeable that stress/ 
defense-responsive genes (G06) took a relatively large portion, 
approximately 13.4%, of entire GO-based classification of the 
genes. In addition, transcription factors (G03, 196 genes) and 
signal transduction-involved proteins (G04, 110 genes), whose 
functions generally act at higher level of regulation in cellular 
process and may be collectively called transcriptional regulator 
proteins, occupied 22.2% in total, which is a largest portion 
amongst all the GO groups. This group of genes may act as central 
modulators in adaptation under ever-changing environmental 
conditions and the result indicates that many of them should 
significantiy be involved in responses to the abiotic stresses. The 
MYB family proteins (55 genes) predominated the G03 followed 
by zinc finger (31 genes) and NAG (NAM/ATAF/CUC, 14 genes) 
transcription factor family proteins (Table SI & S2). Many of them 
contained genes previously well-known as responsive to abiotic 
stresses, of which, for example, included DREB/ CBF (dehydra- 
tion-responsive element-binding/C-repeat and DRE binding 
factor, 13 genes), AREB or ABF/ABI (ABA-responsive element- 
binding/ ABA-insensitive protein, 5 genes), ANAC019 
(Atlg52890; positive regulator of ABA signaling) [27], ICEl 
(At3g26744; inducer of CBF expression 1) [28] and many other 
genes (Table SI). 

To see how much portion of genes operate within the interface 
of different types of abiotic stresses or share their pathways, the 
1.4K gene set was analyzed according to their expression cues 
(Figure 3). Out of three representative abiotic stresses, drought 
assumed the largest portion with a total of 8 1 1 genes followed by 
salt (434 genes) and cold stresses (300 genes). 128 Genes in the 
miscellaneous category include some indistinct, but potentially 
implicated with ABS response, descriptions for the expression cues 
to be classified into any of three criteria, such as "ABA-responsive" 
and "ROS-induced", implying that they would be reclassified into 
any distinct groups in the future. It is noteworthy that a total of 
185 genes appeared to respond against more than two different 



Drought 




Figure 3. Distribution of ABS-responsive genes among three 
representative abiotic stresses, drought, cold and salinity. 

Expression cues for individual genes were determined by experimental 
platforms of relevant literature and the description of corresponding 
genes in the Arabidopsis database. Genes in the miscellaneous category 
usually had broad or ambiguous functional descriptions, such as abiotic 
stresses, ROS-responsive and ABA-induced. 
doi:1 0.1 371/journal.pone.0091 721 .g003 

abiotic stresses. Of these genes, 1 12 genes seemed to respond in all 
the three different types of stresses, indicating that those genes 
and/ or proteins involved in ABS responses share their pathways in 
certain interactive manner. 

A total of twenty resources were employed to make the genome- 
wide collection of ABS-related genes, including 2 databases and 18 
research articles (Table SI). Within the 1.4K gene set, 188 genes 
were hit by more than two references, which may implicate their 
functional involvement in ABS responses. These genes were 
further organized according to their GO (Figure SI). Interestingly, 
genes belonging to three GO classes (i.e., G03/4/6), collectively 
called proteins associated with expressional regulation (GO 3 and 
G04) in response to abiotic stresses (G06), occupied almost the 
half portion (86 genes, 45.8%) of the multi-hit genes (Figure SI, 
Table S3), which provide a major portion, but not all, of genes 
involved in both upstream and downstream of the ABS-responsive 
gene networks. This gene set appears to contain many functionally 
important proteins, for example RD29A/LTI78 (At5g52310; 
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Figure 2. Classification of ABS-related genes according to 18 biological process gene ontology (GO) criteria. 

doi:1 0.1 371/journal.pone.0091 721 .g002 
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responsive to dehydration/low temperature-induced; hit by 10 
references), RD26/ANAC072 (At4g27410; hit by 8 references), 
RD20 (At2g33380), COR15a and COR15b (At2g42540 and 
At2g4253(), respectively; cold-regulated proteins) and many 
others. Of those 86 genes, 22 genes were shown to be induced 
by all three abiotic stresses, in some cases even by pathogen 
infection (Table S3), reconfirming the fact, as already shown in the 
Figure 3, that drought/ cold/salt stresses dynamically interact with 
each other and potentially even with pathogenic processes. 

Identification of cross-family orthologous genes and 
translational efficiency of genomic information 

Accuracy of identifying orthologs within the evolutionary 
context is crucial in translating genomic information between 
different species. For purposes of cross-family transfer of genomic 
information, we used a combination of parameters and tools by 
taking an integrative consideration of E-value and bit-score (i.e., a 
total length of corresponding alignment) of the tBlastX/BlastP- 
derived alignments. Final decision for the t ross-species ortholog)' 
was carefully made by scrutinizing resulting alignments along with 
investigator's comments. Firstly, we dividcxi the translated g(;n(;s 
according to the E-values, because it is a fundamental parameter 
of the first priority to be considered for the orthologous gene 
prediction, with regular intervals (Figure 4A). Of these, 352 genes 
(25.6%) with the value of E = 0 took the largest portion followed by 
307 genes with 0<E<E"""', 123 genes with E"""'<E<E"'"' and 
150 genes with E"''"<E<E'^" of E value intervals. Taken together, 
a total of 932 genes (67.7%), although other genes above these E 
value intervals were not necessarily excluded from the ortholog 
criteria, showed reasonably significant E value-based prediction of 
orthology. Occasionally, E value-based identification of orthologs, 
in some sense, by an automatic prediction through bioinformatics 
data processing might be prone to errors mainly due to the 
complexity of genome, especially such as protein families resulting 
from gene duplications and homeologous chromosomes by 
polyploicUzation. Thus, we employed an additional means to 
ensure more accurate identification of orthologs by remarking 
and/or giving comments on actual tBlastX alignment outcomes 
(see legend of Figure 4 for the detailed information of remarks). 
The remarks fall into nine criteria (Figure 4). With the alignment 
remarks, a total of 829 genes (60.2%) were predicted as tentative 
orthologs above the level of FS (fairly specific) homolog search 
criterion (data not shown). To more accurately confirm the 
orthology of homologous candidates, genes with the value of <E' 
and remarks of UC (uncertain) /NA (not available) were 
removed and reanalyzed according to the alignment remarks, 
thereby resulting in a total of 1 134 genes (82.4% of 1.4K gene set; 
Figure 4B). Of these, a total of 797 genes (70.3% of 1134 genes; 
57.9% of 1377 genes) were determined potentially to be 
orthologous between these two model systems, which seems 
reasonable for the cross-family prediction of orthologous genes. In 
addition, it is noteworthy that the removal of genes with less 
significant E-values substantially contributed to increasing the 
probability, which increased from 60.2% to 70.3%, of finding 
orthologous counterparts between these two species (Figure 4B). 
These analyses also showed that the removal of genes with the E- 
values of low significance did not gready affected tlu' orthology 
prediction of genes above FS level of the remarks, by which only 
32 genes (3.8% of 829 genes) were excluded, while below this 
remark level, 78 genes (18.8% of 415 genes without UC/NA 
criteria) were removed (Figure 4B). This result indicates that 
proper use of the E-value criteria is obviously useful for the 
translation of genomic information, but it has limitations to the 
accuracy in identifying orthologous genes. Therefore, it is 



recomm(;nd("d that a combination of approaches, including the 
E-value, total length of afignment and up-close investigation of 
individual alignments, are necessary to ensure cross-species 
translation of genomic information. The calculated GO-based 
translational efficiencies appeared various according to different 
GOs, ranging approximately from 60% (GO 12 & 16) to 96% 
(GO 11) (data not shown). However, it looked obvious that low 
translation rates were always seen in the case where the gene 
numbers of corresponding GO were low (for example G012/16), 
which may be more sensitively affected by counting predicted 
orthologs. Taken together, this implicates that translation of 
genomic information is not significandy affected by different 
functional criteria of genes. 

Genomic positional translation and integration of the 
comparative map with a gene sub-network 

Comparative genomic analysis is an essential and efficient tool 
for translating genomic contexts of one species into other different, 
but taxonomically related, species. Comparative translation of 
genomic information is especially useful to save the research 
resources, not only because it enables to gain insights into genome- 
wide relationships among multiple genomes under comparison, 
but also because the translated information can be effectively 
applied to other organisms of researcher's interest. For purposes of 
interconnecting genomic information between Arabidopsis and M. 
truncatula, which represent two plant families of agricultural 
importance, loci of each predicted orthologous genes were 
positioned, based on the fully sequenced genomic locus informa- 
tion, in both model genomes. An in-house user-friendly platform 
for the comparative analysis was developed (relevant manuscript 
being prepared separately) and used to produce the comparative 
maps (Figure 5 and Figure S2). 

Each of inferred orthologous gene loci were translated between 
the two genomes and analyzed for syntenic relationships. To avoid 
excessive complexity of resulting comparative maps, we conducted 
the analyses by translating locus information from single chromo- 
some of one species to entire chromosomes of another compared 
species. Figure S2 A&B demonstrate Arabidopsis-centered com- 
parison and Medicago-centered analysis, respectively. With 1377 
gene loci, only 4.2% of 33,198 Arabidopsis annotated gene models 
[7], associated potentially with the ABS responses, we couldn't find 
extensive synteny blocks between these two species. The compar- 
ative analysis revealed only fragmental colinearity with a total of 
37 synteny blocks (Figure S2 C, Table S4). The segmental syntenic 
relationship c:ould readily be expected because two plant families 
of the Fabaceae and Brassicaceae, to which two model systems 
belong, diverged at the ancient time of approximately 125~136 
MYA (Figure 1) and consequendy their genomes should have 
experienced a diverse array of evolutionary events. The numbers 
of synteny blocks on each Arabidopsis and M. truncatula 
chromosome ranged from 5 to 9 and from 2 to 9, respectively 
(Figure S2 A&B), whUe no synteny blocks were found in MtChr6 
(Figure S2B). To see how genomic locations of ABS genes were 
correlated with their functional specialization, we counted ABS 
genes in each chromosome according to different GO criteria 
(Figure 5 and Table S5). Based on this analysis, although the data 
is not fully supportive, it seems, in general, unlikely that there is a 
significant correlation between the conserved synteny and 
functional organization of ABS-responsive genes within certain 
genomic regions. However, we could observe that relatively higher 
number of ABS genes resided in AtChr 1&5 and MtChr 4&5, 
which might implicate these genomic regions were relatively better 
equipped with ABS responses. Of these four chromosomes, 
interestingly, MtChr5 was most populated with ABS genes and 
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Figure 4. Cross-family translational efficiency of ABS-responsive gens between Arabidopsis and M. truncatula. A. Prediction of cross- 
species orthologous genes according to tine Expect values. Bars indicate the number of genes, while the solid lines the percentage of translated 
genes. B. Distribution of translated genes with the E-value of <E"^'' according to visual inspection of the tBIastX alignments: UQ, unique; VS, very 
specific; S, specific; FS, fairly specific; LS, less specific; SIVI, specific but multiple homologs; M, multiple homologs. The numbers in outer circle denote 
the number of genes, while the numbers of inner circle denote corresponding percentage. 
doi:1 0.1 371/journal.pone.0091 721 .g004 



showed the highest counts of ABS genes in the G03/4/6 (Table 
S5), which are relatively more important functional criteria in the 
aspect of ABS responsiveness. On the other hand, it is interesting 
to note that any single synteny block was not found in the MtChr6, 
in which 52 ABS genes, relatively smaller number than in other 
chromosomes, were contained. This observation seems consistent 
with previous reports on comparative analysis done with limited 
number of genetic markers [29] and recent M. truncatula whole 
genome analysis [13] that MtChrG is rich in heterochromatic 
DNA and relatively poor in expressed genes [12]. Furthermore, it 
has been known that the majority of NBS-LRR resistance 
analogous genes were localized as clusters in the MtChr6 [30], 
indicating that the MtChrG is functionally specialized for the 
resistance mechanism against pathogen infections. Likewise, 
functionally specialized genomic regions for the ABS responses 
might exist and remain to be further elucidated with more 
extended data in the future. 

A total of eight tandem duplication, two in Arabidopsis (ERD6 
and glutathione S-transferase genes) and six in M. truncatula 
(ubiquitin-conjugating enzyme, sarcosine oxidase and six more), 
were found within the 37 synteny blocks (for detail see Table S4). 
On the other hand, genome-wide analyses of ABS genes resulted 
in identification of 49 tandemly duplicated genomic sites. Of these, 
24 sites were tandemly duplicated only in Arabidopsis, 1 3 only in 
M. truncatula and 8 in both genomes (Table S6). It appears that 
genes tandemly duplicated in both genomes include mainly 
effector proteins by which exert their functions to recover the 
balance in homeostasis, with sufficient copy numbers, against the 
stresses. For example, these included glutathione S-transferase, 
plasma membrane intrinsic proteins (RD28), cytochrome P450 
monooxygenase and many others (for more detail see Table S6). 
Interestingly, five copies of Na''"/H''"-exchanger, which is an 
important protein in regulating osmotic homeostasis, are shown 
to be tandemely dupficated in both MtChr4 and MtChr5. 

Function of genes involved in certain biological responses and/ 
or metabolisms can be better triangulated when component genes 
are considered within the context of functional gene networks. 
Subsequently, as a prelude of the comparative functional genomic 
analysis, we attempted to integrate one of gene sub-networks, in 



which all interactions among genes were supported by physical 
protein-to-protein interactions [23], with corresponding compar- 
ative map (Figure 6). 

The sub-network presented in Figure 6 elucidates a central 
ABA-mediated pathway in response to abiotic stresses, in which 
harbors mostly known core components of the pathway from the 
upstream, for example soluble ABA receptors (PYR/PYL/RCAR: 
pyrabactin resistance/PYR-like protein/regulatory component of 
ABA receptor), to the downstream, for example, two potassium 
channels (KTl cSc KT2-3). Also, the network demonstrates 
correlations and/or connections between these components, 
including a series of ceUular processes occurring upon stress such 
as perception, signal transduction, transcriptional regulation and 
expression of effector genes. The sub-network is composed of a 
total of 36 genes and can be further classified three groups: (i) 
perception and signaling-related sub-network; three ABA recep- 
tors, two SnRK2s (subfamily 2 snfl -related kinase), five PP2Cs 
(type 2C protein phosphatase); (ii) calcium-mediated signal 
transduction; one CaM (calmodulin), two CBLs (calcineurin B- 
like proteins) and nine CIPKs (CBL-interacting serine/threonine 
protein kinase); (iii) transcriptional regulation; six ABRE/ABF 
bZIP transcription factors and three AFPs (ABI five binding 
proteins, also known as Ninja-famUy protein, by which act as a 
negative regulator of bZIP TFs). In addition, five downstream 
effector genes are connected with the network: two potassium 
channels (KTl & KT2-3), catalase (CAT2), H^'-ATPase (HA2) 
and 1 -amiocyclopropane- 1 -carboxylase synthase (ACS6; involved 
in ethylene synthesis) (Table 1, Table S7). 

There have recently been two important breakthroughs in ABA 
signafing model: (i) the discovery of soluble ABA receptor [31,32]; 
(ii) the identification of PP2C-SnRK complex, negative and 
positive regulators of ABA receptor respectively, as a downstream 
component of ABA receptor complex [33,34]. Due to these two 
important fmdings in 2009, ABA signaling model has been 
dramaticaUy figured out that the PYR/PYL/RCAR-PP2C- 
SnRK2 complex plays the central role in response to abiotic 
stresses, such as drought and salinity [35] . In Arabidopsis, a total 
of 14 PYR/PYL/RCAR ABA receptor homologs have been 
identified [35]. Of these, three PYR/PYL/RCAR proteins 
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Figure 5. Cross-family locus translation of ABS-responsive genes through comparative analyses according to gene ontologies. Eight 
different colors, for M. truncatula chromosomes, were used to distinguish different chromosomal locations; IVltlG-red, Mt2G-green, iVlt3G-deep blue, 
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(At2g40330, At2g38310, At4gl7870) are involved as members of 
the sub-network, in which the ABA receptors are shown to interact 
with three PP2Cs (Atlg72770; known as HABl for homolog to 
ABIl, At3gll410, At4g26080; known as ABIl) and two SnRKs 
(At3g50500, At4g33950; also known as OSTl for open stomata 1) 
(Figure 6). This tripartite ABA signaling core complex is shown to 
associate with the calcium-mediated regulation of gene expression. 
With regard to this, CBL and CIPK interactions play a major role 
in the downstream signaling. In Arabidopsis, 10 CBLs and 26 
CIPKs constitute a complicated interaction network with some 
extent of functional redundancy [36]. Of these, three Ca^"*" sensors 
and eight CIPKs participated in the sub-network, in which a 
diverse array of interactions between sensors such as CBLs and 
CIPKs were apparent (Figure 6) and well documented in genetic 
studies of plant's responses to the salinity, for example SOS (salt 
overly sensitive) pathway [37]. The same research group reported 
that SOS2 (a type of CIPKs) forms a complex with SOS3 (a type 
of CBL proteins), and lead to activation of SOSl (a Na"'"/H"'" 
antiporter) important for ion homeostasis and salt tolerance. In 
this SOS pathway, ABI2 (a type of PP2C) act as a negative 
regulator by deactivating SOS2 or dephosphorylating SOSl [38]. 



The network seems to fairly weU represent the CBL-CIPK-PP2C 
interactions among genes involved in the Ca^"''-mediated signaling, 
although their modes of action, such as activation and deactiva- 
tion, are not denoted. Within the network, it seems likely that the 
secondary signal of calcium ion is transduced mainly through 
CPK32, a calcium-dependent protein kinase, to AREB/ABF bZIP 
transcription factors. It was experimentally verified that AtCPK32 
is involved in Ca^'''-mediated transduction of ABS signals by 
interacting with ABF4, a transcriptional regulator [39] . In addition 
to ABF4, five more transcription factors (i.e., ABF2/3, AREB3 
and ABI3/5, which are all bZIP TFs except for ABI3) constitute 
positive regulatory part of the TF sub-network. In contrast, three 
negative regulators, which are named as AFP (i.e., ABI five 
binding proteins), take parts in the transcriptional regulatory 
network (Figure 6). It is currendy believed that AFPs are needed to 
fine-tune stress responses by interacting with ABI5 and/ or ABI,5- 
related bZIP TFs [40]. 

On the other hand, ion transporters play crucial roles especially 
in maintaining ion homeostasis through osmoregnlation. Among 
others, potassium ion is the most important and abundant cation, 
and subsequendy its corresponding channels or transporters may 
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Figure 6. An example of integrative analysis for a core ABS regulatory sub-network and corresponding locus-translated 
comparative map. The sub-network was analyzed and reconstructed based on the Arabidopsis protein-to-protein interactome data. Availability of 
phenotypic information (for detail, see Table S7) and chromosome locations are denoted by a combination of different colors and shapes. Loci of 
individual genes within the sub-network are connected to corresponding positions on the Arabidopsis chromosomes. Detailed information of sub- 
network component genes is available in Table 1. 
doi:1 0.1 371 /journal.pone.0091 721 .g006 



essentially affect the responses to abiotic stresses. Two potassium 
channels/transporters (KTs: At2g26650, At4g22200) are involved 
in the gene network in conjunction with the CBL-CIPK signaling 
pathway (Figure 6). Interestingly, it has recently been suggested 
that CBLs and CIPKs may compete with each other to bind with 
the channel and act in antagonistic manner, in which CIPK is a 
positive modulator and vice versa, to modulate KT's activity [41]. 
It has also recently been validated that HA2 (Atlg 07430; a ABA- 
induced and seed specific PP2C, also known as HONSU) directly 
interacts with KTl and regulates seed dormancy by acting as a 
negative modulator of ABA signaling [42]. 

Taken together, component genes within the chosen PPI- 
supported subnetwork appear to be well aligned not only with 
previously known functional studies, but also with interactome 
data [23], offering a nice case study of network-based translational 
genomic approaches. This type of genomic translation integrated 
with functional gene networks may facilitate more accurate 
transfer of genomic information between different species and 
can be apphed to systematic functional studies of genes. 



Discussion 

Whole genome sequences and complete gene sets within the 
genomic contexts, which have recently become available in many 
plant species, can provide broad insights into how genes and 
genomic structures are organized and have changed during the 
processes of adaptation to fit ever changing environment. 
Therefore, it is expected that the translational genomics, if one 
properly utilizes such approaches, may play critical roles, based on 
genome-wide understanding, to leverage some key areas, such as 
basic plant biology and development of more efficient selection 
processes for molecular breeding, ultimately towards the crop 
improvement. Genomic responses to abiotic stress involve, in both 
quantitative and qualitative senses, complex traits, which are 
determined by interactions between multiple genes. In this study, 
we intended to establish and provide a genome-wide hst of ABS- 
responsive genes for the purpose of building up a basic resource for 
the cross-species translational genomics, between Arabidopsis and 
M. truncatula, and further gaining insight into key components 
governing the ABS-responses in plants. 

Validation of homologous gene's orthology is the prerequisite 
for the comparative genomic translation, because multi-copy 
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genes, for example family proteins, may frequentiy lead to 
erroneous transfer of genomic information. However, accurate 
prediction of the gene's orthology between different species is not 
straightforward largely due to complex history of speciation and 
duplication. A substantial effort has been made to transfer 
orthologous functional annotation of genes between different 
genomes. For example, EnsemblCompara Gene Trees [43], 
specialized only for vertebrates, and InParanoid [44,45], devel- 
oped mainly for eukaryotic species, employed phylogenetic 
approaches to resolve the complexity of ortholog identification, 
using TreeBeST method and two-pass BLAST, respectively. 
Particularly in plants, Arabidopsis played a central role in 
organizing functional annotation of genes in association with the 
Gene Ontology Consortium, and each of annotations were defined 
by 1 1 evidence codes, for typical examples ISS for "inferred from 
sequence similarity" and IPI for "inferred from physical interac- 
tion" [46]. The functional annotations of the gene in correspond- 
ing TAIR database (http://www.arabidopsis.org) is being contin- 
uously updated by integrating experimental data and literature- 
based information. Phytozome (http://www.phytozome.net) is 
one of the representative comparative genomics platforms for 
plants [47]. The database contains information of 25 completed 
genomic sequences and employs the common BLAST or BLAT 
(BLAST-Iike alignment tool) to perform the similarity searches. In 
contrast, GreenPhylDB (http://greenphyl.cirad.fr) provides com- 
parative analysis platform specialized only for two representative 
plant species, Arabidopsis and rice, with the emphasis on 
phylogenomic approach for the family proteins [48]. On the 
other hand, Abrouk et al. [7] emphasized three components to 
ensure orthologous relationships between genes: (i) comparison of 
inter-specific coding sequence, (ii) similarity based on cumulative 
identity percentage (CIP), (iii) relative length of the sequences 
represented by cumulative alignment length percentage (CALP). 

Although we did not develop or use an automated ortholog 
detection tool equipped with mathematically programmed algo- 
rithms, we tried to meet above orthology criteria, and subse- 
quently orthologous genes were finally determined by inspecting 
individual alignments derived from the tBlastX/BlastP with highly 
trained human eyes and integrated knowledge on similarity 
searches. We also used our own multi-layered denotation on the 
sequence alignments from "unique" (for clear orthologous 
relationship; i.e., one-to-one relationship) to "multiple" (for 
multi-gene family and/or obscure orthology) (Figure 4). We 
imagined that the denotation would be especially useful when fully 
sequenced genomes, because they represented entirety of genomic 
context, were compared to translate the genomic information. 
Taken together, reliable range of translation rate was estimated 
approximately 60% (Figure 4) between Arabidopsis and M. 
truncatula. This translation rate is seemingly low, but probably 
reasonable if one considers their divergence time of 125~136 
MYA (Figure 1). This estimation could be applied when 
researchers intend to transfer genomic information between two 
plant species with similar divergence time. 

In nature, plants are simultaneously exposed to multiple stresses, 
including abiotic and biotic stresses, and the stresses often interact 
with each other. Although many genes known to be involved in the 
ABS responses have been identified and elucidated within the 
context of gene interaction network [49] , little has, in reality, been 
known about how these genes are interwoven within the network 
and interact between different abiotic stresses. Thus it would be 
critical to discover key cross-stress interplayers towards fuU 
understanding of orchestrated networks composed of ABS genes. 
In this study, a total of 185 (13.4% of 1.4K gene set) genes were 
identified as potential candidates for such interplaying components 
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(Figure 3), although they could not be further analyzed towards the 
end of interactome analysis due to lack of information. However, 
we expect that the area or number of genes within the interacting 
interface of these three abiotic stresses would probably increase as 
more data and experimental evidences accumulate. Intriguingly, 
several recent reports suggest that the ABA, previously known as a 
central plant hormone regulating ABS responses, also play 
important roles in the interaction between abiotic stresses and 
pathogenesis [49] . One interesting example was shown that ABA 
was also involved in the defense response by closing stomata upon 
pathogen attack [,50] , thereby implicating that ABA plays its role 
against pathogen infection in more direct way than previously 
expected. As seen, a broader array of interactions between 
multiple stresses, including ABS and even biotic stresses, is 
becoming an emerging topic in this field of researches. Therefore, 
further analyses towards this direction are in great demand to shed 
more light on comprehensive understanding of network-based 
ABS responses, even with biotic stress responses. 

Because ABS responses are QTL traits, extremely complex and 
thereby numerous genes are involved, the list of 1377 genes 
collected in this study seems large enough to explain full scenario 
of responses against abiotic stresses. However, the gene set is 
probably heterogeneous, because some portion of genes are 
perhaps be involved in general physiological responses, which 
means that individual genes may have differential significance in 
their functions. Various functions of genes commonly constitute 
networks within the hierarchical context at different levels on 
which they mainly act in the courses of reactions against given 
external stimuli. Thus, it is important to triangulate key genes from 
the central networks within which govern corresponding respons- 
es. In order for the comparatively translated genomic data to be 
more valuable, functional genomic data need to be integrated. For 
purposes of integrating these two different types of data, gene 
network and/ or interactome analyses with the fuU set of 1.4K 
genes are currentiy under way (relevant manuscript will be 
prepared separately). 

In addition to the functional aspect of the sub-network 
(Figure 6), we could observe some features of genomic organiza- 
tion of the sub-network genes. For example, we could find a couple 
of regions in which genes are localized in relatively closer 
proximity: (i) CIPK2/5/15 in terminal region of AtGhr5, (ii) 
CPK32, AREB3 and SnRK2.2 in AtChr3, (iii) PYL4/6, TCH3 
and ABI5 in AtChr2, (iv) CIPK3, SIP4 and KTl in AtChr2 
(Figure 6 & Table 1). It would be interesting to see that the AtChr4 
is most populated or organized with genes from the broadest 
functional criteria of the gene network, which might be suggestive 
of structural organization, to some extent, for the functional 
speciaUzation in ABS responses. However, the idea seems 
premature, remain elusive and need to be further assessed with 
more integrated data of genomic organization in conjunction with 
more extended gene network analyses. 

As exemplified in Figure 6, the sub-network for ABA-triggered 
stress responses, although reconstructed only by using bioinfor- 
matic tools and/or resources, appeared to be well aligned with 
previously reported protein-to-protein interaction data [23,49], 
and may play a central role towards further expansion of the 
network by integrating with other pathways, for example ethylene 
pathway (e.g., ACS6 in the sub-network) and ROS pathway (e.g., 
CAT2 in the sub-network). The subnetwork-associated compar- 
ative translation of genomic information presented in this study 
(Figure 6) was only a small example as a prelude towards a genome 
scale functional comparative analysis. The 1.4K gene set-wide 
gene /interactome network analysis is currentiy underway with 
necessary update of data and/or information, and the results will 



be integrated with structural genomic data, in the near future, with 
an ultimate goal of gaining broader and deeper insights towards 
understanding the whole picture of complex interactions of genes 
involved in abiotic stresses. This way, one would be able to better 
identify key components associated with tolerance mechanisms 
against ABS-derived challenges and can apply the gained 
knowledge for the molecular breeding programs. However, 
species-to-species translation of genomic information may not be 
such a straightforward process, where the situation is more crucial 
for complex traits such as the abiotic stresses, that one need to take 
a careful caution to enhance accuracy and/or quality of the 
information transfer. In leguminous plants, for example, such 
genomic translation was tried to discover core genes responsive to 
salt stress, and successfully done using a combination of 
transcriptomic and metabolomics data with six Lotus species 
[51]. In this particular example, we need to notice that they 
utihzed translational genomic approaches within only closely 
related model and crop species from the same genus. On the other 
hand, even cross eudicot-to-monocot genomic translation was 
conducted with 4,630 orthologous gene pairs identified between 
Arabidopsis and rice [11]. Corresponding results showed highly 
conserved coexpression pattern of 77%, consequently indicating a 
broad range of applicability of translational genomics, when 
properly appUed to species of interest. 

Translational genomics, from a practical point of view, seems 
likely still in its infancy. It still has considerable obstacles to be 
overcome, such as development of more accurate ortholog finding 
algorithms, comprehensive curation of functional genomic data 
and reliable integration of large scale omics data. Despite all these 
obstacles, it seems most likely that translational genomics should 
be a strong tool to overcome the barriers in translating genomic 
context among different species. Towards this direction, the data 
gained from translation of genomic information could be utilized 
to identify trait-associated genes, and subsequently exploit marker 
discovery and development for the purpose of molecular breeding. 

Since ABS-related genomic information has been translated 
into the legume model system M. truncatula in this study, it would 
readily become possible to conduct pan-legume translation of 
genomic information on ABS genes across fully sequenced legume 
species. Until now, five legume species have been fully sequenced 
and their genomic information is available in pubUc domains. This 
situation may provide a great opportunity for translating 
functional discoveries in model systems into many other legume 
crops, which are relatively poor in genomic information, through 
breeding programs and molecular biotechnology. Furthermore, 
similar approaches could be applied in other functional criteria, 
which are of great biological importance in legumes, such as 
symbiotic nitrogen fixation, seed maturation and physiology, and 
secondary metabolisms. However, the efficiency and/or predict- 
ability of the cross-species translational genomic applications 
remain to be further refined so that individual genes and/ or gene 
networks would functionally work properly for the complex QTL- 
associated traits such as the abiotic stresses. Nevertheless, we 
anticipate that our study would provide rich information on ABS- 
responsive genes and potentially contribute to identifying key 
genes and developing trait-associated molecular genetic markers 
towards the improvement of important crop quality related to the 
abiotic stress tolerance. 

Supporting Information 

Figure SI GO-based classification of genes hit by 
multiple resources. Three GO classes (G03/4/6), which 
occupy largest portion and are associated with transcriptional 
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regulation and response to abiotic stresses, are demonstrated 

separately using pie chart. 

(PPTX) 

Figure S2 Locus translation and synteny analysis be- 
tween Arabidopsis and M. truncatula. A. Comparative 
analyses of single Arabidopsis chromosome-to-M truncatula whole 
genome. Each of five Arabidopsis chromosomes was separately, to 
avoid excessive complexity, analyzed with eight chromosomes of 
M. truncatula. Corresponding syntenic comparative analyses are 
shown side-by-side. B. Comparative analyses of single M. truncatula 
chromosome-to-Arabidopsis whole genome. Each of eight chro- 
mosomes of M. truncatula was separately analyzed with five 
Arabidopsis chromosomes. Corresponding syntenic comparative 
analyses are shown side-by-side. C. Integrated comparative map 
and synteny analysis of translated ABS-responsive gene loci. 
(PPTX) 

Table SI Genome-wide list of genes associated with 
abiotic stress responses. 

PCLSX) 

Table S2 Classification of transcription factors (G03) 
and signaling-associated proteins (G04). 

PCLSX) 
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